-
Notifications
You must be signed in to change notification settings - Fork 56
Making sure summarization leverages KV cache #152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making sure summarization leverages KV cache #152
Conversation
4e2b5ae to
7a33fa9
Compare
|
@claude review this PR. |
|
Claude finished @kovtcharov-amd's task —— View job PR Review: KV Cache for Summarization
Overall Assessment: 🟢 Looks good with minor suggestions The PR correctly addresses the TTFT issue by switching from What's Good
Issues Found🟡 History pollution - The Suggestion: Consider using chat_response = self.send(
summary_prompt,
no_history=True, # Don't add to history
max_tokens=min(self.config.max_tokens, 2048),
timeout=1200,
)This would be cleaner since you're clearing and replacing history anyway. However, using 🟢 Minor - Line 843: The VerificationThe logic flow is:
This correctly leaves only the summary in history after completion. Approve with minor suggestion - The code is functional and achieves the stated goal. The |
|
@claude your guess at the end is correct, if we set no_history=True we would not have the history and we wouldn't be able to summarize the context. Are their any other Medium priority issues with this PR? |
|
Claude finished @eddierichter-amd's task —— View job Re-reviewing PR for Medium Priority Issues
Thanks for confirming the Additional Findings: After re-examining the PR, I don't see any other medium priority issues. The changes are clean and correct. Here's my assessment of the remaining minor items: 🟢 Minor (already mentioned): Line 843 has a defensive check 🟢 Minor: The prompt at line 823-824 now says "You have full access to the prior conversation history above" - this wording is clear and appropriate for the new flow. No other issues found. The implementation correctly:
The PR is good to merge from a code quality perspective. |
|
@kovtcharov-amd this has passed Claude's review. Let me know what you think. This is a really nice quality of life improvement as it reduces the time of summarization from over a minute to a couple seconds given. |
The initial implementation of summarization used the .generate() method in the ChatSDK which bypassed sending previously stored messages to the LLM. It therefore needed to send the entire prompt to the agent which was formatted slightly differently and bypassed the KV cache and had to be processed. This resulted in very high TTFT for summarization. This change reduces the TTFT of summarization to be similar to other requests.